Blending Propensity Score Matching and Synthetic Minority Over-sampling Technique for Imbalanced Classification
Authors
Abstract
Real-world data sets often contain disproportionate sample sizes across observed groups, which makes the task of prediction algorithms very difficult. One of the many ways to combat the inherent bias of class-imbalanced data is re-sampling. In this paper we discuss two popular re-sampling approaches proposed in the literature, Synthetic Minority Over-sampling Technique (SMOTE) and Propensity Score Matching (PSM), as well as a novel approach referred to as Over-sampling Using Propensity Scores (OUPS). Using simulation, we conduct experiments that show a statistical improvement in accuracy and sensitivity when using OUPS over both SMOTE and PSM.
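For readers unfamiliar with SMOTE, the following is a minimal sketch of its core interpolation step, not the implementation used in the paper. The abstract gives no details of OUPS, so OUPS is not shown. The function name `smote_sketch` and its parameters (`n_new`, `k`, `seed`) are hypothetical, and Euclidean distance on numeric features is an assumption.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (illustrative sketch only).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position along the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[pick] - X_min[base])
```

Calling `smote_sketch(X_minority, n_new=100)` returns an array with 100 rows that can be appended to the training set with the minority label. Because every new point is an interpolation of two existing ones, the synthetic data never leaves the range of the observed minority samples.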
Similar resources
Learning Classifiers from Imbalanced, Only Positive and Unlabeled Data Sets
In this report, I present my results for the tasks of the 2008 UC San Diego Data Mining Contest. The contest consists of two classification tasks based on data from a scientific experiment. The first is a binary classification task whose goal is to maximize classification accuracy on an evenly distributed test data set, given a fully labeled, imbalanced training data set. The second task is also ...
Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning
In recent years, mining with imbalanced data sets has received more and more attention in both theoretical and practical aspects. This paper introduces the importance of imbalanced data sets and their broad application domains in data mining, and then summarizes the evaluation metrics and the existing methods to evaluate and solve the imbalance problem. Synthetic minority oversampling technique (S...
An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique
Analysing unbalanced datasets is one of the challenges that practitioners in the machine learning field face. Much research has been carried out to determine the effectiveness of the synthetic minority over-sampling technique (SMOTE) in addressing this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalance...
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class-imbalanced classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...
An Impact Estimator Using Propensity Score Matching: People’s Business Credit Program to Micro Entrepreneurs in Indonesia
The people’s business credit program (KUR) has been launched to alleviate poverty through the provision of micro financing to micro entrepreneurs in Indonesia. This study aims to estimate the impact of the KUR program using cross-sectional data and the propensity score matching technique (PSM). The survey was conducted on 332 household entrepreneurs, consisting of 155 KUR receivers and 177 non-KUR r...